Cost Estimation Techniques for Database Systems
Abstract
This dissertation is about developing advanced selectivity and cost estimation techniques for query optimization in database systems. It addresses the following three issues related to current trends in database research: estimating the cost of spatial selections, building histograms without looking at data, and estimating the selectivity of XML path expressions.
The first part of this dissertation deals with estimating the cost of spatial selections, or window queries, where the query windows and the data objects are general polygons. Previously proposed cost estimation techniques only handle rectangular query windows over rectangular data objects, thus ignoring the significant cost of exact geometry comparison (the refinement step in a “filter and refine” query processing strategy). The cost of the exact geometry comparison depends on the selectivity of the filtering step and the average number of vertices in the candidate objects identified by this step. We develop a cost model for spatial selections that takes these parameters into account. We also introduce a new type of histogram for spatial data that captures the size, location, and number of vertices of the spatial objects. Capturing these attributes makes this type of histogram useful for accurate cost estimation using our cost model, as we experimentally demonstrate.
The second part of the dissertation introduces self-tuning histograms. While similar in structure to traditional histograms, self-tuning histograms are built not by examining the data or a sample thereof, but by using feedback from the query execution engine about the selectivities of range selections on the histogram attributes to progressively refine the histogram. Since self-tuning histograms have a low up-front cost and the cost of building them is independent of the data size, they are an attractive alternative to traditional histograms, especially multidimensional histograms.
The low cost of self-tuning histograms can help a self-tuning, self-administering database system experiment with building many different histograms on many different combinations of data columns. This is useful since the system cannot rely on a database administrator to decide which histograms to build.
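As an illustration of the feedback idea described for self-tuning histograms, the following is a minimal one-dimensional sketch in Python. It is not the dissertation's algorithm: the class name `SelfTuningHistogram`, the equi-width buckets, the `damping` factor, and the rule of spreading the estimation error over overlapping buckets in proportion to their current contribution are all assumptions made for this example. The only points taken from the abstract are that the histogram starts without scanning the data and is refined from the observed result sizes of range selections.

```python
from dataclasses import dataclass, field


@dataclass
class SelfTuningHistogram:
    """Hypothetical equi-width histogram refined only from query feedback."""
    lo: float
    hi: float
    n_buckets: int
    total_rows: float
    damping: float = 0.5  # assumed factor limiting how far one query moves the histogram
    freqs: list = field(init=False)

    def __post_init__(self):
        # No data scan: start from a uniform distribution over the buckets.
        self.freqs = [self.total_rows / self.n_buckets] * self.n_buckets

    def _overlap_fractions(self, q_lo, q_hi):
        # Fraction of each bucket's width covered by the range [q_lo, q_hi).
        width = (self.hi - self.lo) / self.n_buckets
        fracs = []
        for i in range(self.n_buckets):
            b_lo = self.lo + i * width
            b_hi = b_lo + width
            overlap = max(0.0, min(q_hi, b_hi) - max(q_lo, b_lo))
            fracs.append(overlap / width)
        return fracs

    def estimate(self, q_lo, q_hi):
        # Estimated number of rows in the range, assuming uniformity within buckets.
        fracs = self._overlap_fractions(q_lo, q_hi)
        return sum(f * c for f, c in zip(fracs, self.freqs))

    def refine(self, q_lo, q_hi, actual_rows):
        # Feedback step: compare the estimate with the row count reported by
        # the execution engine and push a damped share of the error into
        # each bucket that overlaps the query range.
        fracs = self._overlap_fractions(q_lo, q_hi)
        covered = sum(fracs)
        if covered == 0:
            return  # query range lies outside the histogram
        contribs = [f * c for f, c in zip(fracs, self.freqs)]
        est = sum(contribs)
        error = self.damping * (actual_rows - est)
        for i, f in enumerate(fracs):
            if f == 0:
                continue
            # Share by current contribution; fall back to overlap if est is 0.
            share = contribs[i] / est if est > 0 else f / covered
            self.freqs[i] = max(0.0, self.freqs[i] + error * share)


if __name__ == "__main__":
    hist = SelfTuningHistogram(lo=0, hi=100, n_buckets=10, total_rows=1000)
    print("before feedback:", hist.estimate(0, 50))
    hist.refine(0, 50, actual_rows=800)  # engine observed 800 rows in [0, 50)
    print("after feedback: ", hist.estimate(0, 50))
```

The cost of each `refine` call depends on the number of buckets, not on the number of rows in the table, which mirrors the property the abstract highlights. Note that this simple update rule does not keep the bucket counts summing to `total_rows`; a fuller design would have to decide whether and how to renormalize.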
Similar Resources
Using Neural Networks with Limited Data to Estimate Manufacturing Cost
Neural networks were used to estimate the cost of jet engine components, specifically shafts and cases. The neural network process was compared with results produced by the current conventional cost estimation software and linear regression methods. Due to the complex nature of the parts and the limited amount of information available, data expansion techniques such as doubling-data and data-cr...
Tire Inflation Pressure Estimation Using Identification Techniques
This study addresses one of the most crucial problems in automotive engineering. The necessity of tire pressure monitoring systems is beyond doubt. Such systems currently rely on expensive sensors. This study proposes an indirect tire pressure monitoring system that uses identification techniques, which will reduce the cost of monitoring considerably in ...
Query Result Size Estimation Techniques in Database Systems
Query optimisers are critical to the efficiency of modern relational database systems. If a query optimiser chooses a poor query execution plan, the performance of the database system in answering the query can be very poor. In fact, the differences in cost between the least and most expensive query execution plans can be several orders of magnitude. On the other hand, it can be prohibitively e...
Wavelet-Based Cost Estimation for Spatial Queries
Query cost estimation is an important and well-studied problem in relational database systems. In this paper we study the cost estimation problem in the context of spatial database systems. We introduce a new method that provides accurate cost estimation for spatial selections, or window queries, by building wavelet-based histograms for spatial data. Our method is based upon two novel technique...
Cost Modeling and Range Estimation for Top-k Retrieval in Relational Databases
Relational databases have increasingly become the basis for a wide range of applications that require efficient methods for exploratory search and retrieval. Top-k retrieval addresses this need and involves finding a limited number of records whose attribute values are the closest to those specified in a query. One of the approaches in the recent literature is query-mapping which deals with con...